High-dimensional regression with potential prior information on variable importance

Authors

Abstract

There are a variety of settings where vague prior information may be available on the importance of predictors in high-dimensional regression. Examples include ordering the variables by their empirical variances (information that is typically discarded through standardisation), by lag when fitting autoregressive models in time series settings, or by the level of missingness of the variables. Whilst such orderings may not match the true ordering of variable importance, we argue that there is little to be lost, and potentially much to be gained, by using them. We propose a simple scheme involving fitting a sequence of models indicated by the ordering. We show that the computational cost of fitting all models when ridge regression is used is no more than that of a single ridge regression fit, and we describe a strategy for the Lasso that makes use of previous fits to greatly speed up fitting the entire sequence of models. We select the final estimator by cross-validation and provide a general result on the quality of the best-performing estimator on a test set selected from among a number M of competing estimators in a high-dimensional linear regression setting. Our result requires no sparsity assumptions and shows that only a $$\log M$$ price is incurred compared to the unknown best estimator. We demonstrate the effectiveness of our approach when applied to missing or corrupted data. An R package is available on github.
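The core scheme described in the abstract — fit a nested sequence of models following the prior ordering, then choose among them on held-out data — can be sketched as follows. This is a minimal illustration, not the authors' R package: the function names (`sequence_of_models`, `select_by_validation`) and the plain closed-form ridge solver are assumptions, and unlike the paper's method, which obtains all ridge fits for the cost of a single one, this naive version simply refits each prefix of the ordering.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def sequence_of_models(X, y, order, lam=1.0):
    """Fit one ridge model per nested prefix of the given importance ordering.

    order: list of column indices, most important first (the prior ordering).
    Returns a list of full-length coefficient vectors, one per prefix size k.
    """
    fits = []
    for k in range(1, len(order) + 1):
        idx = order[:k]                      # first k variables in the ordering
        beta_sub = ridge_fit(X[:, idx], y, lam)
        beta = np.zeros(X.shape[1])          # embed back into the full space
        beta[idx] = beta_sub
        fits.append(beta)
    return fits

def select_by_validation(fits, X_val, y_val):
    """Pick the fitted model with the smallest validation mean squared error."""
    errs = [np.mean((y_val - X_val @ b) ** 2) for b in fits]
    return fits[int(np.argmin(errs))], errs
```

Even when the ordering is only loosely informative, early prefixes act as aggressive variable screening while the final prefix recovers the ordinary full fit, so the selected model can do no worse (on the validation set) than fitting all variables.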


Related articles

On robust regression with high-dimensional predictors.

We study regression M-estimates in the setting where p, the number of covariates, and n, the number of observations, are both large, but p ≤ n. We find an exact stochastic representation for the distribution of $$\hat{\beta} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(Y_i - X_i'\beta)$$ at fixed p and n under various assumptions on the objective function ρ and our statistical model. A scalar random variable whose determinist...


Faithful Variable Screening for High-dimensional Convex Regression

We study the problem of variable selection in convex nonparametric regression. Under the assumption that the true regression function is convex and sparse, we develop a screening procedure to select a subset of variables that contains the relevant variables. Our approach is a two-stage quadratic programming method that estimates a sum of one-dimensional convex functions, followed by one-dimensi...


Variable Importance Assessment in Regression: Linear Regression versus Random Forest

Relative importance of regressor variables is an old topic that still awaits a satisfactory solution. When interest is in attributing importance in linear regression, averaging over orderings methods for decomposing R² are among the state-of-the-art methods, although the mechanism behind their behavior is not (yet) completely understood. Random forests—a machine-learning tool for classification a...


Methods for regression analysis in high-dimensional data

With the evolution of science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been developed, which have resulted in the appearance and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...


Constrained Inverse Regression for Incorporating Prior Information

Inverse regression methods facilitate dimension-reduction analyses of high-dimensional data by extracting a small number of factors that are linear combinations of the original predictor variables. But the estimated factors may not lend themselves readily to interpretation consistent with prior information. Our approach to solving this problem is to first incorporate prior information via theor...



Journal

Journal title: Statistics and Computing

Year: 2022

ISSN: 0960-3174, 1573-1375

DOI: https://doi.org/10.1007/s11222-022-10110-5